Scope fastokens adaptation to individual tokenizers#91
Conversation
ApprovabilityVerdict: Needs human review This PR refactors the fastokens integration from process-wide patching to per-tokenizer adaptation, changing how core tokenizer loading infrastructure operates. While the change simplifies code and improves isolation, modifications to how this optimization is applied warrant human review. You can customize Macroscope's approvability policy. Learn more. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a670c50a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # restore the prior patch state. Never cache a non-offset tokenizer. | ||
| # This path deliberately loads through vanilla Transformers because | ||
| # fastokens adaptation is scoped to ``load_tokenizer``'s returned object. | ||
| offset_tok = _load_tokenizer_via_auto(load_name_or_path, **kwargs) |
There was a problem hiding this comment.
Restore unpatched offset reloads under global fastokens
When the host process has already called fastokens.patch_transformers() (for example because another serving stack enables it globally), this reload is not vanilla: _load_tokenizer_via_auto() returns another fastokens shim, _has_offsets() stays false, and hand-coded renderers that call attribute_text_segments() raise instead of using the offset cache. The deleted fallback used to temporarily unpatch in exactly this case, so this path still needs to force the offset tokenizer reload through unpatched Transformers.
Useful? React with 👍 / 👎.
Overview
Load tokenizers through vanilla Transformers first, then replace only the returned tokenizer's backend with the same fastokens compatibility shim used by the global patch path.
This removes the process-wide patch window from
renderers.load_tokenizerwhile preserving the fastokens-backed tokenizer returned to renderer callers.Why
The previous path called
fastokens.patch_transformers()around the completeAutoTokenizer.from_pretrainedoperation. Tokenizer loading can take hundreds of milliseconds or longer, and the patch mutates Transformers classes and decoder state process-wide for that entire interval. Unrelated tokenizer loads running concurrently could therefore receive fastokens unexpectedly, including its lack of offset mappings.Per-instance adaptation keeps Transformers globals vanilla throughout the load. Concurrent renderer-pool slots still receive independent fastokens backends, while environment, harness, and application tokenizer calls remain unaffected.
Behavior
use_fastokens=False, trusted revision pins, tokenizer mirrors, and renderer auto-resolution retain their existing behavior.This narrower tokenizer ownership contract allows async consumers such as Verifiers to initialize renderer pools in worker threads without exposing temporary Transformers mutations to other coroutines.
Note
Medium Risk
Touches the default tokenizer load path used by renderer pools (performance and thread-safety contract), but behavior is intentionally preserved with added concurrency coverage.
Overview
Replaces process-wide
fastokens.patch_transformers()with per-instance backend adaptation inload_tokenizer: tokenizers load through vanillaAutoTokenizerfirst, then only the returned object's_tokenizeris wrapped withfastokens._compat._TokenizerShim.This removes the global patch window (and related stdout suppression / patch lock) that could affect concurrent
AutoTokenizer.from_pretrainedcalls during slow loads. Encode/decode parity,FASTOKENS_INCOMPATIBLE,use_fastokens=False, and trusted-revision behavior stay the same; adaptation failures now keep the already-loaded vanilla tokenizer instead of reloading.Offset-tokenizer loading in
_get_offset_tokenizerdrops the patch/unpatch race workaround and always uses vanilla Transformers, since fastokens no longer mutates globals.Tests add a concurrent “slow load” case and refresh wording from “patch” to “adaptation.”
Reviewed by Cursor Bugbot for commit 7a670c5. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Scope fastokens adaptation to individual tokenizers instead of globally patching transformers
fastokens.patch_transformers()/ unpatch calls with per-tokenizer backend swapping via a new_adapt_tokenizer_with_fastokensfunction in renderers/base.py.load_tokenizernow loads via vanillaAutoTokenizerfirst, then swaps only the returned tokenizer's_tokenizerbackend to the fastokens_TokenizerShim; otherAutoTokenizercalls elsewhere in the process are unaffected._get_offset_tokenizerno longer toggles any fastokens global state; it loads a vanilla fast tokenizer directly._FASTOKENS_ANNOUNCE_LOCK) replaces the previous per-load stdout redirect.AutoTokenizer.from_pretrainedcalls made whileload_tokenizeris in progress now always get a vanilla tokenizer, eliminating the previous race window introduced by global patching.Macroscope summarized 7a670c5.